Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction
نویسندگان
چکیده
Web search result clustering aims to facilitate information search on the Web. Rather than the results of a query being presented as a flat list, they are grouped on the basis of their similarity and subsequently shown to the user as a list of clusters. Each cluster is intended to represent a different meaning of the input query, thus taking into account the lexical ambiguity (i.e., polysemy) issue. Existing Web clustering methods typically rely on some shallow notion of textual similarity between search result snippets, however. As a result, text snippets with no word in common tend to be clustered separately even if they share the same meaning, whereas snippets with words in common may be grouped together even if they refer to different meanings of the input query. In this article we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction. Key to our approach is to first acquire the various senses (i.e., meanings) of an ambiguous query and then cluster the search results based on their semantic similarity to the word senses induced. Our experiments, conducted on data sets of ambiguous queries, show that our approach outperforms both Web clustering and search engines.
منابع مشابه
Semantic is Beautiful: Clustering and Diversifying Search Results with Graph-based Word Sense Induction
Web search result clustering aims to facilitate information search on the Web. Rather than presenting the results of a query as a flat list, these are grouped on the basis of their similarity and subsequently shown to the user as a list of possibly labeled clusters. Each cluster is supposed to represent a different meaning of the input query, thus taking into account the language ambiguity issu...
متن کاملClustering Web Search Results with Maximum Spanning Trees
We present a novel method for clustering Web search results based on Word Sense Induction. First, we acquire the meanings of a query by means of a graph-based clustering algorithm that calculates the maximum spanning tree of the co-occurrence graph of the query. Then we cluster the search results based on their semantic similarity to the induced word senses. We show that our approach improves c...
متن کاملUBC-AS: A Graph Based Unsupervised System for Induction and Classification
This paper describes a graph-based unsupervised system for induction and classification. The system performs a two stage graph based clustering where a cooccurrence graph is first clustered to compute similarities against contexts. The context similarity matrix is pruned and the resulting associated graph is clustered again by means of a random-walk type algorithm. The system relies on a set of...
متن کاملFinding Community Base on Web Graph Clustering
Search Pointers organize the main part of the application on the Internet. However, because of Information management hardware, high volume of data and word similarities in different fields the most answers to the user s’ questions aren`t correct. So the web graph clustering and cluster placement in corresponding answers helps user to achieve his or her intended results. Community (web communit...
متن کاملWoSIT: A Word Sense Induction Toolkit for Search Result Clustering and Diversification
In this demonstration we present WoSIT, an API for Word Sense Induction (WSI) algorithms. The toolkit provides implementations of existing graph-based WSI algorithms, but can also be extended with new algorithms. The main mission of WoSIT is to provide a framework for the extrinsic evaluation of WSI algorithms, also within end-user applications such as Web search result clustering and diversifi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Computational Linguistics
دوره 39 شماره
صفحات -
تاریخ انتشار 2013